
69 research outputs found

PA-Tree: A Parametric Indexing Scheme for Spatio-temporal Trajectories

Author: D. Pfoser
G. Kollios
J.C. Mason
K. Porkaew
M. Hadjieleftheriou
M.A. Nascimento
T. Brinkhoff
Y. Cai
Y. Tao
Y. Theodoridis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Abstract. Many new applications involving moving objects require the collection and querying of trajectory data, so efficient indexing methods are needed to support complex spatio-temporal queries on such data. Current work in this domain has used minimum bounding rectangles (MBRs) to approximate trajectories, but MBRs fail to capture some basic properties of trajectories, including smoothness and lack of internal area. This mismatch leads to poor pruning when such indices are used. In this work, we revisit the issue of using parametric space indexing for historical trajectory data. We approximate a sequence of movement functions with a single continuous polynomial. Since trajectories tend to be smooth, our approximations work well and yield much finer approximation quality than MBRs. We present the PA-tree, a parametric index that uses this new approximation method. Experiments show that PA-tree construction costs are orders of magnitude lower than those of competing methods. Further, for spatio-temporal range queries, MBR-based methods require 20%–60% more I/O than PA-trees with clustered indices, and 300%–400% more I/O than PA-trees with non-clustered indices.
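The core idea, replacing a chain of movement functions with one smooth polynomial, can be illustrated with an ordinary least-squares fit. The sketch below is not the PA-tree's actual approximation scheme: it assumes a single coordinate x(t) sampled at known times, fits one polynomial of a chosen degree via the normal equations, and reports the largest deviation as a rough measure of approximation quality. The function names (`fit_polynomial`, `evaluate`, `max_deviation`) are hypothetical.

```python
def fit_polynomial(ts, xs, degree):
    """Least-squares coefficients c[0..degree] so that x(t) ~ sum(c[k] * t**k)."""
    m = degree + 1
    if len(ts) != len(xs) or len(ts) < m:
        raise ValueError("need matching ts/xs with at least degree + 1 samples")
    # Normal equations A c = b, with A[j][k] = sum t^(j+k) and b[j] = sum x * t^j.
    A = [[sum(t ** (j + k) for t in ts) for k in range(m)] for j in range(m)]
    b = [sum(x * t ** j for t, x in zip(ts, xs)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: sample times are not distinct enough")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    c = [0.0] * m
    for r in range(m - 1, -1, -1):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, m))) / A[r][r]
    return c


def evaluate(c, t):
    """Evaluate the polynomial with coefficients c at time t (Horner's rule)."""
    acc = 0.0
    for coef in reversed(c):
        acc = acc * t + coef
    return acc


def max_deviation(ts, xs, c):
    """Largest absolute gap between the samples and the fitted polynomial."""
    return max(abs(x - evaluate(c, t)) for t, x in zip(ts, xs))


if __name__ == "__main__":
    # Two linear movement functions joined at t = 0.5, sampled on [0, 1].
    ts = [i / 20 for i in range(21)]
    xs = [t if t <= 0.5 else 0.5 + 0.2 * (t - 0.5) for t in ts]
    coeffs = fit_polynomial(ts, xs, degree=3)
    print("max deviation:", max_deviation(ts, xs, coeffs))
```

Rescaling the time axis to [0, 1] before fitting, as in the example, keeps the normal equations reasonably well conditioned; for higher degrees an orthogonal basis such as Chebyshev polynomials would be the more robust choice.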

CiteSeerX

Crossref

Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution

Author: A. Sinclair
A.G. Dimakis
C. Intanagonwiwat
D.R. Kowalski
G. Kollios
I.F. Akyildiz
J.-Y. Chen
L. Gasieniec
L. Xiao
M. Jelasity
P. Erdos
P. Jesus
R. Olfati-Saber
S. Boyd
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Flow-Updating (FU) is a fault-tolerant technique that has proved efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss), because such failures cause a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU), obtained by applying FU techniques to classic MD. We analyze the convergence time of MDFU, showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even when the input values change dynamically. Comment: 18 pages, 5 figures, To appear in OPODIS 201
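To see why message loss hurts classic Mass-Distribution, consider a push-sum style gossip protocol, assumed here as a representative MD scheme (it is not MDFU). Each node repeatedly splits its (mass, weight) pair and ships half to a random neighbour; without loss the per-node ratios converge to the global average, but every dropped message removes mass permanently. `push_sum` and its parameters are hypothetical names for this illustration.

```python
import random


def push_sum(values, neighbors, rounds, loss_rate=0.0, seed=0):
    """Synchronous push-sum gossip over an undirected graph.

    Each round, every node keeps half of its (mass, weight) pair and sends
    the other half to one uniformly chosen neighbour. A lost message simply
    vanishes, so the mass it carried is gone for good.
    Returns (per-node estimates mass/weight, total remaining mass).
    """
    rng = random.Random(seed)
    n = len(values)
    mass = [float(v) for v in values]
    weight = [1.0] * n
    for _ in range(rounds):
        new_mass = [0.0] * n
        new_weight = [0.0] * n
        for i in range(n):
            half_m, half_w = mass[i] / 2, weight[i] / 2
            new_mass[i] += half_m  # the half the node keeps
            new_weight[i] += half_w
            j = rng.choice(neighbors[i])
            if rng.random() >= loss_rate:  # message delivered
                new_mass[j] += half_m
                new_weight[j] += half_w
        mass, weight = new_mass, new_weight
    estimates = [m / w for m, w in zip(mass, weight)]
    return estimates, sum(mass)


if __name__ == "__main__":
    nbrs = [[j for j in range(5) if j != i] for i in range(5)]  # complete graph
    print(push_sum([1, 2, 3, 4, 10], nbrs, rounds=100))
    print(push_sum([1, 2, 3, 4, 10], nbrs, rounds=100, loss_rate=0.3))
```

With `loss_rate=0.3` the remaining total mass shrinks far below the sum of the inputs, which is the failure mode the abstract attributes to MD protocols and the one MDFU is designed to tolerate.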

arXiv.org e-Print Archive

CiteSeerX

Universidade do Minho: RepositoriUM

Crossref

Risk-Averse Matchings over Uncertain Graph Databases

Author: A Khan
AE Roth
B Bollobás
D Liben-Nowell
G Kollios
J Edmonds
LG Valiant
M Kargar
M Kearns
M Potamias
N Bansal
N Chen
NJ Krogan
NN Dalvi
P Berman
P Boldi
RM Karp
S Asthana
YH Chan
Publication date: 09/01/2018

A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work and captures both continuous and discrete probability distributions, thus allowing it to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a $\frac{1}{3}$-approximation algorithm and a $\frac{1}{5}$-approximation algorithm with near-optimal running time. For the case of uncertain weighted hypergraphs, we provide an $\Omega(\frac{1}{k})$-approximation algorithm, where $k$ is the rank of the hypergraph (i.e., any hyperedge includes at most $k$ nodes), that runs in almost linear time (modulo log factors). We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward and risk. We also provide an application of our formulation to recommending teams that are likely to collaborate and have high impact. Comment: 25 page
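The reward/risk trade-off can be made concrete with a toy greedy heuristic. This is not the paper's 1/3- or 1/5-approximation algorithm: it assumes independent edge weights summarized by a mean and a variance, treats the total variance of the chosen edges as the risk, and greedily adds edges in decreasing expected weight while respecting a risk budget. All names are hypothetical.

```python
def greedy_risk_bounded_matching(edges, risk_budget):
    """Greedy matching on an uncertain weighted graph (hypothetical heuristic).

    edges: iterable of (u, v, mean, variance) describing each edge's random weight.
    Assumes independent edge weights, so the variance of the total reward is
    the sum of the chosen edges' variances; that sum is used as the risk.
    Returns (chosen_edges, expected_reward, risk).
    """
    chosen, used = [], set()
    reward = risk = 0.0
    # Consider edges from highest to lowest expected weight.
    for u, v, mean, var in sorted(edges, key=lambda e: e[2], reverse=True):
        if u in used or v in used:
            continue  # would violate the matching constraint
        if risk + var > risk_budget:
            continue  # would exceed the risk budget
        chosen.append((u, v))
        used.update((u, v))
        reward += mean
        risk += var
    return chosen, reward, risk


if __name__ == "__main__":
    edges = [(0, 1, 5.0, 4.0), (1, 2, 4.0, 1.0), (2, 3, 3.0, 1.0), (0, 3, 1.0, 0.0)]
    print(greedy_risk_bounded_matching(edges, risk_budget=2.0))
```

In the example the highest-reward edge (0, 1) is skipped because its variance alone exceeds the budget, which is the kind of trade-off a risk-averse formulation makes explicit.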

arXiv.org e-Print Archive

Crossref

Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars

Author: A. Hewish
A. P. Dempster
C. Sterken
Carla E. Brodley
Charles Alcock
D. Hawkins
D. L. Pollacco
D. Yu
G. Kollios
G. Richter
J. Yang
M. Petit
M. Schmidt
N. N. Samus’
P. Protopapas
Pavlos Protopapas
R. W. Klebesadel
S. Gaffney
S. Mallat
Umaa Rebbapragada
V. Barnett
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/05/2009

Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data preclude the use of these methods, as the periods of any given pair of light-curves may be out of sync. One could apply an existing anomaly detection method by first aligning every pair of light-curves before computing their similarity, but this alignment is costly and scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time series data that outputs a ranked list of both global and local anomalies. It calculates the anomaly score of each light-curve relative to a set of centroids produced by a modified k-means clustering algorithm. Our method scales to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD's reported anomalies are comparable to or better than those of all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
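The central difficulty, comparing light-curves whose phases are not aligned, can be illustrated with a phase-invariant distance. The sketch below assumes each light-curve has already been folded into one period and resampled at the same n phase points; it takes the minimum Euclidean distance over all cyclic shifts and scores each curve by its distance to the nearest centroid. It omits PCAD's modified k-means step and its sampling, so the centroids are simply given, and the function names are hypothetical.

```python
import math


def shift_distance(a, b):
    """Smallest Euclidean distance between series a and any cyclic shift of b."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("series must be non-empty and equally long")
    best = math.inf
    for s in range(n):
        d = math.sqrt(sum((a[i] - b[(i + s) % n]) ** 2 for i in range(n)))
        best = min(best, d)
    return best


def anomaly_scores(curves, centroids):
    """Score each curve by its phase-invariant distance to the nearest centroid."""
    return [min(shift_distance(c, m) for m in centroids) for c in curves]


def rank_anomalies(curves, centroids):
    """Indices of curves ordered from most to least anomalous."""
    scores = anomaly_scores(curves, centroids)
    return sorted(range(len(curves)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    centroid = [1.0, 0.0, -1.0, 0.0]
    curves = [[0.0, 1.0, 0.0, -1.0], [5.0, 5.0, 5.0, 5.0], [0.0, 0.9, 0.0, -1.1]]
    print(rank_anomalies(curves, [centroid]))
```

The brute-force shift search costs O(n²) per pair; computing the cross-correlation with an FFT would reduce this to O(n log n).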

arXiv.org e-Print Archive

Crossref

Indexing Moving Objects Using Short-Lived Throwaway Indexes

Author: B. Cui
D. Hilbert
D.E. Knuth
D.G. Severance
D.V. Kalashnikov
G. Kollios
H. Tropf
J.A. Orenstein
L. Arge
M. Pelanis
P. Muth
T. Brinkhoff
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref

Median Trajectories

Author: A.F. Stappen van der
Carola Wenk
D. Eppstein
D. Halperin
E.W. Chambers
F.Y.L. Chin
G. Kollios
G. Tóth
H. Alt
H. Edelsbrunner
J. Gudmundsson
J. Hershberger
J. Lee
J.-G. Lee
J.R. Munkres
K. Buchin
K. Kedem
Kevin Buchin
Lionov Wiratma
M.A. Armstrong
Maarten Löffler
Maike Buchin
Marc van Kreveld
N. Amenta
O. Aichholzer
P. Laube
P.K. Agarwal
Rodrigo I. Silveira
S. Cabello
S. Dodge
S. Durocher
S. Gaffney
S. Har-Peled
T.K. Dey
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Mining poly-regions in DNA

Author: Benson G.
Kollios G.
Papapetrou Panagiotis
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2012

We study the problem of mining poly-regions in DNA. A poly-region is defined as a bursty DNA area, i.e., an area of elevated frequency of a DNA pattern. We introduce a general formulation that covers a range of meaningful types of poly-regions and develop three efficient detection methods. The first is entropy-based and applies recursive segmentation. The second uses a set of sliding windows that summarize each sequence segment using several statistics. Finally, the third employs a technique based on majority vote. The proposed algorithms are evaluated in terms of recall and runtime on DNA sequences from four different organisms.
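The sliding-window idea can be sketched as follows. This is a simplified, hypothetical detector, not one of the paper's three methods: it uses the density of a single pattern as its only window statistic, and merges overlapping windows whose density reaches a threshold into (start, end) regions, with `end` exclusive.

```python
def poly_regions(seq, pattern, window, min_freq):
    """Merged regions where `pattern` starts at >= min_freq of window positions.

    window: number of candidate start positions per window.
    Returns a list of half-open (start, end) intervals over `seq`.
    """
    k = len(pattern)
    if k == 0 or window <= 0:
        raise ValueError("pattern must be non-empty and window positive")
    # hits[i] == 1 when the pattern occurs starting at position i.
    hits = [1 if seq.startswith(pattern, i) else 0 for i in range(len(seq) - k + 1)]
    regions = []
    count = sum(hits[:window])
    for start in range(len(hits) - window + 1):
        if start > 0:
            # Slide the window one position: add the new slot, drop the old one.
            count += hits[start + window - 1] - hits[start - 1]
        if count / window >= min_freq:
            end = start + window + k - 1  # cover the last possible match
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    dna = "ACGT" * 5 + "A" * 10 + "ACGT" * 5
    print(poly_regions(dna, "A", window=5, min_freq=0.75))
```

Keeping a running count makes each window update O(1), so a pass over a sequence of length n costs O(n) after the initial pattern scan.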

Birkbeck Institutional Research Online

Self-tuning management of update-intensive multidimensional data in clusters of workstations

Author: Kriakov V.
Kollios, G.
Delis, A.
Publication date: 01/01/2009

Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, updated in a timely manner. Single-server storage approaches are insufficient for managing such volumes of data, while the high frequency of data modification renders classical indexing methods inefficient. To address these two problems, we introduce a distributed storage manager for multidimensional data based on a Cluster-of-Workstations. The manager addresses the above challenges through a set of mechanisms that collectively maintain a balanced load across a cluster of workstations by means of selective on-line data reorganization. With the help of a fast and efficient self-tuning mechanism, based on a new data structure called stat-index, as well as a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead. © 2009 Springer-Verlag
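The role of a migration cost model in choosing a redistribution strategy can be illustrated with a greedy rebalancing planner. Everything here is an assumption rather than the paper's model: each partition is described by a (load, size) pair, migration cost is taken to be linear in size via a hypothetical `cost_per_size` factor, and a move from the most loaded to the least loaded site is accepted only if it narrows their load gap by more than its cost.

```python
def plan_migrations(site_parts, cost_per_size, max_moves=10):
    """Greedy migration plan under an assumed linear cost model.

    site_parts: {site: {partition_id: (load, size)}}; the input is not mutated.
    Returns (list of (partition_id, from_site, to_site), resulting placement).
    """
    parts = {site: dict(p) for site, p in site_parts.items()}
    moves = []
    for _ in range(max_moves):
        load = {site: sum(l for l, _ in p.values()) for site, p in parts.items()}
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        best = None
        for pid, (l, size) in parts[hot].items():
            # Moving load l changes the gap from `gap` to |gap - 2l|.
            gain = gap - abs(gap - 2 * l) - cost_per_size * size
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, pid)
        if best is None:
            break  # no move pays for its migration cost
        pid = best[1]
        parts[cold][pid] = parts[hot].pop(pid)
        moves.append((pid, hot, cold))
    return moves, parts


if __name__ == "__main__":
    placement = {"A": {"p1": (6, 1), "p2": (4, 1), "p3": (2, 1)}, "B": {"p4": (1, 1)}}
    print(plan_migrations(placement, cost_per_size=0.5))
```

Raising `cost_per_size` makes the planner more conservative, trading residual imbalance for less data movement.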

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Management of highly dynamic multidimensional data in a cluster of workstations

Author: Kriakov V.
Delis, A.
Kollios, G.
Publication date: 01/01/2004

Due to the proliferation and widespread use of mobile devices and satellite-based sensors, there has been increased interest in storing and managing spatio-temporal and sensory data. It has been recognized that centralized and monolithic index structures do not scale well enough to address the highly dynamic nature (high update rates) and the unpredictable access patterns of such datasets. In this paper, we propose an adaptive networked index method designed to address these challenges. Our method not only facilitates fast query and update response times via dynamic data partitioning but is also able to self-tune highly loaded sites. Our contributions consist of techniques that offer dynamic load balancing of computing sites, non-disruptive on-the-fly addition/removal of storage sites, distributed collaborative decision making for the self-administration of the manager, and statistics-based data reorganization. These features are incorporated into a distributed software layer prototype used to evaluate the design choices made. Our experimentation compares the performance of a baseline configuration with our multi-site system, examines the attained speed-up as a function of the number of participating sites, investigates the effect of data reorganization on query/update response times, assesses the effectiveness of our proposed dynamic load balancing method, and examines the behavior of the system under diverse types of multi-dimensional data. Keywords: Data Management in Cluster of Workstations, Networked Storage Manager, Self-tuning Storage Nodes, and Multi-dimensional Data. © Springer-Verlag 2004
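Dynamic partitioning of multidimensional data combined with on-the-fly addition of sites can be sketched with a kd-tree style split. In this hypothetical sketch, raw point counts stand in for load: a site holding more than `capacity` points splits its data at the median of the dimension with the largest spread and hands the upper half to a newly added site. The site-naming scheme and both function names are illustrative assumptions.

```python
def split_region(points):
    """Split points at the median of the dimension with the largest spread.

    Returns (lower_half, upper_half, split_dimension, cut_value).
    """
    if len(points) < 2:
        raise ValueError("need at least two points to split")
    dims = len(points[0])
    spread = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    dim = max(range(dims), key=lambda d: spread[d])
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:], dim, ordered[mid][dim]


def rebalance(sites, capacity):
    """Split any site holding more than `capacity` points, adding new sites.

    sites: dict mapping a site name to its list of d-dimensional points.
    New site names ("<parent>.<n>") are purely illustrative.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    result = {name: list(pts) for name, pts in sites.items()}
    pending = list(result)
    counter = 0
    while pending:
        name = pending.pop()
        if len(result[name]) <= capacity:
            continue
        lower, upper, _, _ = split_region(result[name])
        counter += 1
        new_name = f"{name}.{counter}"
        while new_name in result:  # avoid clobbering an existing site
            counter += 1
            new_name = f"{name}.{counter}"
        result[name], result[new_name] = lower, upper
        pending.extend([name, new_name])  # either half may still be too big
    return result


if __name__ == "__main__":
    print(rebalance({"s1": [(i, i % 3) for i in range(10)]}, capacity=4))
```

Splitting along the widest dimension keeps each site's region roughly compact, which helps range queries touch fewer sites.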

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens